suppressMessages(library(magrittr))
suppressMessages(library(ggplot2))
suppressMessages(library(plotly))
suppressMessages(library(webshot))
gapminder <- read.csv("../../../../data/gapminder-FiveYearData.csv")
gapminder.gg <- ggplot(gapminder) +
geom_line(aes(x = year, y = lifeExp, color = country),
alpha = .85, stat = "smooth", method = "loess") +
facet_grid(~continent, scales = "free") +
theme(legend.position = "none",
axis.text.x = element_text(angle = 60, hjust = 1),
plot.margin = margin(0,0,50,10))
ggplotly(gapminder.gg)
In order to make the GG plot look readable the legend needs to be temporarily hidden away because there are so many different countries in the data set. However since this is an html document you can use the plotly package to make the graph interactive. At a quick glance there definately seems to be a large difference in life expectancy over the years on a country to country basis and a quick LM will go on and show this. However printing the full thing will take a lot of space consdering that there are 142 different countries so here is just a quick glance at the Asian Countries
gapminder.asia.lm <- lm(lifeExp ~ country,
data = gapminder[gapminder$continent == "Asia",])
library(pander)
pander(summary(gapminder.asia.lm))
| Â | Estimate | Std. Error | t value | Pr(>|t|) |
|---|---|---|---|---|
| (Intercept) | 37.48 | 2.525 | 14.84 | 2.749e-39 |
| countryBahrain | 28.13 | 3.571 | 7.877 | 3.928e-14 |
| countryBangladesh | 12.36 | 3.571 | 3.46 | 0.000604 |
| countryCambodia | 10.42 | 3.571 | 2.919 | 0.003727 |
| countryChina | 24.31 | 3.571 | 6.807 | 4.132e-11 |
| countryHong Kong China | 36.01 | 3.571 | 10.09 | 2.979e-21 |
| countryIndia | 15.69 | 3.571 | 4.393 | 1.467e-05 |
| countryIndonesia | 16.86 | 3.571 | 4.721 | 3.359e-06 |
| countryIran | 21.16 | 3.571 | 5.925 | 7.242e-09 |
| countryIraq | 19.1 | 3.571 | 5.35 | 1.562e-07 |
| countryIsrael | 36.17 | 3.571 | 10.13 | 2.118e-21 |
| countryJapan | 37.35 | 3.571 | 10.46 | 1.481e-22 |
| countryJordan | 22.31 | 3.571 | 6.247 | 1.169e-09 |
| countryKorea Dem. Rep. | 26.13 | 3.571 | 7.317 | 1.636e-12 |
| countryKorea Rep. | 27.52 | 3.571 | 7.708 | 1.239e-13 |
| countryKuwait | 31.44 | 3.571 | 8.806 | 5.398e-17 |
| countryLebanon | 28.39 | 3.571 | 7.95 | 2.385e-14 |
| countryMalaysia | 26.8 | 3.571 | 7.506 | 4.766e-13 |
| countryMongolia | 18.41 | 3.571 | 5.156 | 4.15e-07 |
| countryMyanmar | 15.84 | 3.571 | 4.437 | 1.212e-05 |
| countryNepal | 11.51 | 3.571 | 3.223 | 0.001385 |
| countryOman | 20.96 | 3.571 | 5.871 | 9.779e-09 |
| countryPakistan | 17.4 | 3.571 | 4.874 | 1.638e-06 |
| countryPhilippines | 23.49 | 3.571 | 6.578 | 1.665e-10 |
| countrySaudi Arabia | 21.2 | 3.571 | 5.937 | 6.783e-09 |
| countrySingapore | 33.74 | 3.571 | 9.45 | 4.3e-19 |
| countrySri Lanka | 29.05 | 3.571 | 8.135 | 6.622e-15 |
| countrySyria | 23.87 | 3.571 | 6.684 | 8.768e-11 |
| countryTaiwan | 32.86 | 3.571 | 9.202 | 2.825e-18 |
| countryThailand | 24.72 | 3.571 | 6.923 | 2.01e-11 |
| countryVietnam | 20 | 3.571 | 5.601 | 4.208e-08 |
| countryWest Bank and Gaza | 22.85 | 3.571 | 6.399 | 4.822e-10 |
| countryYemen Rep. | 9.302 | 3.571 | 2.605 | 0.009566 |
| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 396 | 8.746 | 0.5006 | 0.4566 |
Lets take another look at life expectancy by continent per year.
continent.gg <- ggplot(gapminder) +
geom_boxplot(aes(x = continent, y = lifeExp), width = 1) +
facet_wrap(~year, ncol = 3) +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.margin = margin(0,0,45,10))
ggplotly(continent.gg)
On a Continent by continental basis its very clear that life expectancy differs greatly. Lets investigate just a bit more closely For Asia.
asian.lifeExp.gg <- ggplot(gapminder[gapminder$continent == "Asia",]) +
geom_boxplot(aes(x = country, y = lifeExp), width = 1) +
theme(axis.text.x = element_text(angle = 80, hjust = 1),
plot.margin = margin(0,0,45,10))
ggplotly(asian.lifeExp.gg)
Now that its clear that there is a Difference of life expectancy by country lets investigate how gdpPercap influences this.
gpdPercap.bycountry.gg <- ggplot(gapminder, aes(x = year, y = gdpPercap, color = country)) + geom_point() +
geom_line(stat = "smooth", method = "loess") +
facet_wrap(~continent, ncol = 1, scales = "free") +
theme(legend.position = "none",
plot.margin = margin(0,0,50,50))
ggplotly(gpdPercap.bycountry.gg)
gapminder$gdpQuartile <- gapminder %>% group_by(year) %$% factor(dplyr::ntile(gdpPercap, 4))
gdpLifeExp.gg <- ggplot(gapminder, aes(x = as.factor(year), y = lifeExp, color = gdpQuartile)) +
geom_boxplot(alpha = .5, fill = NA) +
geom_line(aes(group = gdpQuartile), stat = "smooth", method = "loess", alpha = .5) +
facet_wrap(~continent, ncol = 1, scale = "free") +
theme(legend.position = "none",
plot.margin = margin(0,50,25,50))
ggplotly(gdpLifeExp.gg)
The things thats the most interesting about these graphs is before creating them I broke down gdpQuartile by year. And you can see that in some years several continents didn’t even have either the highest or lowest quartiles of gdp but regardless of that there is a clear trend that higher gdpPer Is having some effects on increased life expectancy. Countries are of course a major part of it but there are clear gaps between each level of gdpQuartile
On a country by country basis gdpPer or GDP Per Capita does show that the life expectancy of an indivual relates directly to the wealth of your nation. Visualizing the data and creating some linear models for this helps to show that in a way thats more easily digestable for people. None of my visualization models or graphs take pop into consideration but using all of the other variables together the effects of wealth and location have on life expectancy is quite clear.
Some trickery has gone into the plotting of the graphs. And that will be covered in the separate R file for the walkthrough of knitting this one.